Approximate String Searching under Weighted Edit Distance

Author

  • Stefan Kurtz
Abstract

Let $p \in \Sigma^*$ be a string of length $m$ and $t \in \Sigma^*$ be a string of length $n$. The approximate string searching problem is to find all approximate matches of $p$ in $t$ having weighted edit distance at most $k$ from $p$. We present a new method that preprocesses the pattern into a DFA which scans $t$ online in linear time, thereby recognizing all positions in $t$ where an approximate match ends. We show how to reduce the exponential preprocessing effort and propose two practical algorithms. The first algorithm constructs the states of the DFA up to a certain depth $r \geq 1$. It runs in $O(|\Sigma|^{r+1} \cdot m + q \cdot m + n)$ time and $O(|\Sigma|^{r+1} + |\Sigma|^{r} \cdot m)$ space, where $q \leq n$ decreases as $r$ increases. The second algorithm constructs the transitions of the DFA when they are demanded. It runs in $O(q_s \cdot |\Sigma| + q_t \cdot m + n)$ time and $O(q_s \cdot (|\Sigma| + m))$ space, where $q_s \leq q_t \leq n$ depend on the problem instance. Practical measurements show that our algorithms work well in practice and beat previous methods for problems of interest in molecular biology.
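The idea of building DFA transitions only when they are demanded can be illustrated with a small Python sketch. This is not the paper's algorithm, only a minimal illustration under stated assumptions: each DFA state is taken to be one column of the standard dynamic-programming table for approximate searching, entries above $k$ are collapsed to a single sentinel so the state set stays finite, and all edit costs are assumed non-negative. The names `unit_cost`, `initial_column`, `next_column`, and `search` are hypothetical, as is the convention that `cost(a, "")` is a deletion and `cost("", b)` an insertion.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cost interface: cost(a, b) is the weight of replacing a by b;
# the empty string "" stands for "no character" (deletion or insertion).
Cost = Callable[[str, str], float]
Column = Tuple[float, ...]


def unit_cost(a: str, b: str) -> float:
    """Unit-cost edit distance; any non-negative cost function could be used."""
    return 0.0 if a == b else 1.0


def initial_column(p: str, cost: Cost, k: float) -> Column:
    """Distances between each prefix of p and the empty text, capped above k."""
    cap = k + 1
    col = [0.0]
    for ch in p:
        value = col[-1] + cost(ch, "")
        col.append(value if value <= k else cap)
    return tuple(col)


def next_column(col: Column, p: str, c: str, cost: Cost, k: float) -> Column:
    """Compute the DP column reached from `col` after reading text character c."""
    cap = k + 1
    new = [0.0]  # a match may start anywhere, so the empty prefix always costs 0
    for i in range(1, len(p) + 1):
        best = min(
            col[i - 1] + cost(p[i - 1], c),   # replacement (or match)
            col[i] + cost("", c),             # insertion of text character c
            new[i - 1] + cost(p[i - 1], ""),  # deletion of pattern character
        )
        new.append(best if best <= k else cap)  # collapse values above k
    return tuple(new)


def search(p: str, t: str, k: float, cost: Cost = unit_cost) -> List[int]:
    """Return 1-based positions in t where a match with distance <= k ends."""
    transitions: Dict[Tuple[Column, str], Column] = {}  # lazily filled DFA table
    state = initial_column(p, cost, k)
    ends = []
    for j, c in enumerate(t, start=1):
        key = (state, c)
        if key not in transitions:  # transition constructed on demand
            transitions[key] = next_column(state, p, c, cost, k)
        state = transitions[key]
        if state[-1] <= k:
            ends.append(j)
    return ends


print(search("abc", "xabcxabdx", 1))  # → [3, 4, 5, 7, 8]
```

Once a transition has been cached, reading a text character costs a single dictionary lookup; the $O(m)$ column update is paid only the first time a given (state, character) pair is seen.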


Similar resources

Restricted Transposition Invariant Approximate String Matching Under Edit Distance

Let A and B be strings with lengths m and n, respectively, over a finite integer alphabet. Two classic string matching problems are computing the edit distance between A and B, and searching for approximate occurrences of A inside B. We consider the classic Levenshtein distance, but the discussion is also applicable to indel distance. A relatively new variant [8] of string matching, motivated in...


A Windowed Weighted Approach for Approximate Cyclic String Matching

A method for measuring dissimilarities between cyclic strings is introduced. It computes a weighted mean between two (lower and upper) bounds of the exact cyclic edit distance, which are founded on a window-constrained edit graph related to the strings involved. Weights are the ones which minimize the sum of squared relative errors of the weighted solution with respect to exact values, on a tra...


Practical Methods for Approximate String Matching

Given a pattern string and a text, the task of approximate string matching is to find all locations in the text that are similar to the pattern. This type of search may be done for example in applications of spelling error correction or bioinformatics. Typically edit distance is used as the measure of similarity (or distance) between two strings. In this thesis we concentrate on unit-cost edit ...


A Best-First Anagram Hashing Filter for Approximate String Matching with Generalized Edit Distance

This paper presents an efficient method for approximate string matching against a lexicon. We define a filter that for each source word selects a small set of target lexical entries, from which the best match is then selected using generalized edit distance, where edit operations can be assigned an arbitrary weight. The filter combines a specialized hash function with best-first search. Our wor...


The Generalized Approximate Regularities in Strings

We concentrate on the generalized string regularities and study the minimum approximate λ-cover problem and the minimum approximate λ-seed problem of a string. Given a string x of length n and an integer λ, the minimum approximate λ-cover (resp. seed) problem is to find a set of λ substrings each of equal length that covers x (resp. a superstring of x) with the minimum error, under a variety of...



Publication date: 1996